GENERIC IDENTIFIABILITY OF LINEAR STRUCTURAL
EQUATION MODELS BY ANCESTOR DECOMPOSITION


MATHIAS DRTON AND LUCA WEIHS 


Abstract. Linear structural equation models, which relate random variables
via linear interdependencies and Gaussian noise, are a popular tool for
modeling multivariate joint distributions. These models correspond to mixed
graphs that include both directed and bidirected edges representing the linear
relationships and correlations between noise terms, respectively. A question of
interest for these models is that of parameter identifiability, whether or not it
is possible to recover edge coefficients from the joint covariance matrix of the
random variables. For the problem of determining generic parameter
identifiability, we present an algorithm that extends an algorithm from prior work
by Foygel, Draisma, and Drton (2012). The main idea underlying our new
algorithm is the use of ancestral subsets of vertices in the graph in application
of a decomposition idea of Tian (2005).


1. Introduction 

It is often useful to model the joint distribution of a random vector
$X = (X_1, \dots, X_n)^T$ in terms of a collection of noisy linear
interdependencies. In particular, we may postulate that each $X_w$ is a linear
function of $X_1, \dots, X_{w-1}, X_{w+1}, \dots, X_n$ and a stochastic noise
term $\epsilon_w$. Models of this type are called linear structural equation
models and can be compactly expressed in matrix form as

(1.1)    $X = \lambda_0 + \Lambda^T X + \epsilon$

where $\Lambda = (\lambda_{vw})$ is an $n \times n$ matrix,
$\lambda_0 = (\lambda_{01}, \dots, \lambda_{0n})^T \in \mathbb{R}^n$, and
$\epsilon = (\epsilon_1, \dots, \epsilon_n)^T$ is a random vector of error terms.
We will adopt the classical assumption that $\epsilon$ has a non-degenerate
multivariate normal distribution with mean 0 and covariance matrix
$\Omega = (\omega_{vw})$. With this assumption it follows immediately that $X$
has a multivariate normal distribution with mean $(I - \Lambda)^{-T} \lambda_0$
and covariance matrix

(1.2)    $\Sigma = (I - \Lambda)^{-T} \Omega (I - \Lambda)^{-1}$

where $I$ is the $n \times n$ identity matrix. We refer the reader to the book by
Bollen (1989) for background on these types of models.

We obtain a collection of interesting models by imposing different patterns of
zeros among the coefficients in $\Lambda$ and $\Omega$. These models can then be
naturally associated with mixed graphs containing both directed and bidirected
edges. In particular, the graph will contain the directed edge $v \to w$ when
$\lambda_{vw}$ is not required to be zero and, similarly, will include the
bidirected edge $v \leftrightarrow w$ when $\omega_{vw}$ is potentially non-zero.
Representations of this type are often called path diagrams and were first
advocated for in Wright (1921, 1934).

A natural question arising in the study of linear structural equations is that of
identifiability: whether or not it is possible to uniquely recover the two
parameter matrices $\Lambda$ and $\Omega$ from the covariance matrix $\Sigma$
they define via (1.2). The most stringent version, known as global
identifiability, amounts to unique recovery of every pair $(\Lambda, \Omega)$ from
the covariance matrix $\Sigma$. This global property can be characterized
efficiently (Drton et al., 2011). Often, however, a less stringent notion
that we term generic identifiability is of interest. This property requires only
that a generic (or randomly chosen) pair $(\Lambda, \Omega)$ can be recovered from
its covariance matrix. The computational complexity of deciding whether a given
mixed graph $G$ defines a generically identifiable linear structural equation
model is unknown. There are, however, a number of graphical criteria that are
sufficient for generic identifiability and can be checked in polynomial time in
the number of considered variables (or vertices of the graph). To our knowledge,
the most widely applicable such criterion is the Half-Trek Criterion (HTC) of
Foygel, Draisma, and Drton (2012a), which built on earlier work of Brito and
Pearl (2006). The HTC also comes with a necessary condition for generic
identifiability but in this paper our focus is on the sufficient condition. We
remark that an extension of the HTC for identification of subsets of edge
coefficients is given in Chen et al. (2014).

We begin with a brief review of background such as the formal connection
between structural equation models and mixed graphs, and give a review of prior
work in Section 2. In the main Section 3, we will demonstrate a simple method by
which to infer generic identifiability of certain entries of $(\Lambda, \Omega)$
by examining subgraphs of a given mixed graph $G$ that are induced by ancestral
subsets of vertices. This will extend the applicability of the HTC in the case of
acyclic mixed graphs after applying the decomposition techniques of Tian (2005).
We leverage this extension in an efficient algorithmic form. In Section 4, we
report on computational experiments demonstrating the applicability of our
findings. A brief conclusion is given in Section 5.


2. Preliminaries 

We assume that the reader is familiar with graphical representations of structural 
equation models and thus only provide a quick review of these topics. For a more 
in-depth treatment see, for instance, Pearl (2009) or Foygel et al. (2012a). 

2.1. Mixed Graphs. For any $n \geq 1$, let $[n] := \{1, \dots, n\}$. We define a
mixed graph to be a triple $G = (V, D, B)$ where $V = [n]$ is a finite set of
vertices and $D, B \subset V \times V$. The sets $D$ and $B$ correspond to the
directed and the bidirected edges, respectively. When $(v, w) \in D$, we will
write $v \to w \in G$ and if $(v, w) \in B$ then we will write
$v \leftrightarrow w \in G$. Since edges in $B$ are bidirected the set $B$ is
symmetric, that is, we have $(v, w) \in B \iff (w, v) \in B$. We require that both
the directed part $(V, D)$ and bidirected part $(V, B)$ contain no self loops so
that $(v, v) \notin D \cup B$ for all $v \in V$. If the directed graph $(V, D)$
does not contain any cycles, so that there are no vertices
$v, w_1, \dots, w_m \in V$ such that
$v \to w_1, w_1 \to w_2, \dots, w_m \to v \in G$, then we say that $G$ is acyclic.
Note, in particular, that $G$ being acyclic does not imply that $(V, B)$ does not
contain any (undirected) cycles.

A path from $v$ to $w$ is any sequence of edges from $D$ or $B$ beginning at $v$
and ending with $w$; here edges need not obey direction and loops are allowed. A
directed path from $v$ to $w$ is then any path from $v$ to $w$ all of whose edges
are directed and pointed in the same direction, away from $v$ and towards $w$.
Finally, a trek $\pi$ from a source $v$ to a target $w$ is any path that has no
colliding arrowheads, that is, $\pi$ must be of the form

$v_l^L \leftarrow v_{l-1}^L \leftarrow \cdots \leftarrow v_0^L \leftrightarrow v_0^R \to v_1^R \to \cdots \to v_{r-1}^R \to v_r^R$

or

$v_l^L \leftarrow v_{l-1}^L \leftarrow \cdots \leftarrow v_1^L \leftarrow v^T \to v_1^R \to \cdots \to v_{r-1}^R \to v_r^R$

where $v_l^L = v$, $v_r^R = w$, and we call $v^T$ the top node. If $\pi$ is as in
the first case then we let $Left(\pi) = \{v_0^L, \dots, v_l^L\}$ and
$Right(\pi) = \{v_0^R, \dots, v_r^R\}$; if $\pi$ is as in the second case then we
let $Left(\pi) = \{v^T, v_1^L, \dots, v_l^L\}$ and
$Right(\pi) = \{v^T, v_1^R, \dots, v_r^R\}$. Note that, in the second case, $v^T$
is included in both $Left(\pi)$ and $Right(\pi)$. A trek $\pi$ is called a
half-trek if $|Left(\pi)| = 1$ so that $\pi$ is of the form

$v_0^L \leftrightarrow v_0^R \to v_1^R \to \cdots \to v_{r-1}^R \to v_r^R$

or

$v^T \to v_1^R \to \cdots \to v_{r-1}^R \to v_r^R$.


It will be useful to reference the local neighborhood structure of the graph. For
this purpose, for all $v \in V$, we define the two sets

(2.1)    $pa(v) = \{w \in V : w \to v \in G\}$,

(2.2)    $sib(v) = \{w \in V : w \leftrightarrow v \in G\}$.

The former comprises the parents of $v$, and the latter contains the siblings of $v$.
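
As an illustration of these definitions, the following minimal sketch stores a mixed graph as two edge sets and computes $pa(v)$, $sib(v)$, and acyclicity of the directed part. The class name `MixedGraph` and its method names are hypothetical choices made for this example; they are not taken from the paper or from any package.

```python
class MixedGraph:
    """Mixed graph G = (V, D, B) with directed edges D and bidirected edges B."""

    def __init__(self, vertices, directed, bidirected):
        self.vertices = set(vertices)
        self.directed = set(directed)
        pairs = list(bidirected)
        # B is symmetric, so store every bidirected edge in both orientations.
        self.bidirected = set(pairs) | {(w, v) for (v, w) in pairs}
        if any(v == w for (v, w) in self.directed | self.bidirected):
            raise ValueError("self loops are not allowed")

    def pa(self, v):
        """Parents of v: all w with w -> v in G."""
        return {w for (w, x) in self.directed if x == v}

    def sib(self, v):
        """Siblings of v: all w with w <-> v in G."""
        return {w for (w, x) in self.bidirected if x == v}

    def is_acyclic(self):
        """Kahn's algorithm on the directed part (V, D); B is ignored."""
        indegree = {v: len(self.pa(v)) for v in self.vertices}
        queue = [v for v, deg in indegree.items() if deg == 0]
        removed = 0
        while queue:
            u = queue.pop()
            removed += 1
            for (a, b) in self.directed:
                if a == u:
                    indegree[b] -= 1
                    if indegree[b] == 0:
                        queue.append(b)
        return removed == len(self.vertices)


# Instrumental variable graph: 1 -> 2 -> 3 with 2 <-> 3.
iv_graph = MixedGraph([1, 2, 3], directed=[(1, 2), (2, 3)], bidirected=[(2, 3)])
```

Here `iv_graph.pa(3)` and `iv_graph.sib(3)` both evaluate to `{2}`, and `iv_graph.is_acyclic()` holds because only the directed part is inspected.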

We associate a mixed graph $G$ to a linear structural equation model as follows.
Let $\mathbb{R}^D$ be the set of real $n \times n$ matrices
$\Lambda = (\lambda_{vw})$ with support $D$, i.e.,
$\lambda_{vw} \neq 0 \Rightarrow (v, w) \in D$. Let $PD_n$ be the cone of
$n \times n$ positive definite matrices $\Omega = (\omega_{vw})$. Define
$PD(B) \subset PD_n$ to be the subset of positive definite matrices with support
$B$, i.e., for $v \neq w$, $\omega_{vw} \neq 0 \Rightarrow v \leftrightarrow w \in G$.

In this paper, we focus on acyclic graphs $G$. If $G$ is acyclic then the matrix
$I - \Lambda$ is invertible for all $\Lambda \in \mathbb{R}^D$. In other words,
the equation system from (1.1) can always be solved uniquely for $X$. We are led
to the following definition.

Definition 2.1. The linear structural equation model given by an acyclic mixed
graph $G = (V, D, B)$ with $V = [n]$ is the collection of all $n$-dimensional
normal distributions with covariance matrix

$\Sigma = (I - \Lambda)^{-T} \Omega (I - \Lambda)^{-1}$

for a choice of $\Lambda \in \mathbb{R}^D$ and $\Omega \in PD(B)$.

2.2. Prior Work and the HTC. For a fixed acyclic mixed graph $G$, let
$\Theta := \mathbb{R}^D \times PD(B)$ be the parameter space and
$\phi_G : \Theta \to PD_n$ be the map

(2.3)    $\phi_G : (\Lambda, \Omega) \mapsto (I - \Lambda)^{-T} \Omega (I - \Lambda)^{-1}$.

Then the question of identifiability is equivalent to asking whether the fiber

$\mathcal{F}(\Lambda, \Omega) := \phi_G^{-1}(\{\phi_G(\Lambda, \Omega)\})$

equals the singleton $\{(\Lambda, \Omega)\}$. We note that the above notions are
well-defined also when $G$ is not acyclic but, in that case, $\mathbb{R}^D$
should be restricted to contain only matrices $\Lambda$ with $I - \Lambda$
invertible.

When $\mathcal{F}(\Lambda, \Omega) = \{(\Lambda, \Omega)\}$ for all
$(\Lambda, \Omega) \in \Theta$, so that $\phi_G$ is injective on $\Theta$, then
$G$ is said to be globally identifiable. Global identifiability is, however, often
too strong a condition. So-called instrumental variable problems, for instance,
give rise to graphs $G$ that would not be globally identifiable but for which the
set of $(\Lambda, \Omega)$ on which identifiability fails has measure zero; see
the example in the introduction of Foygel et al. (2012a). Instead, we will be
concerned with the question of generic identifiability.
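
The distinction between global and generic identifiability can be made concrete on the standard instrumental variable graph $1 \to 2 \to 3$ with $2 \leftrightarrow 3$, where $\lambda_{23} = \Sigma_{13} / \Sigma_{12}$ whenever $\lambda_{12} \neq 0$. The sketch below evaluates the map (2.3) in exact rational arithmetic. The helper names and the numerical parameter values are assumptions chosen for illustration, and the inverse $(I - \Lambda)^{-1}$ is computed as the finite sum $I + \Lambda + \dots + \Lambda^{n-1}$, which is valid only because the graph is acyclic.

```python
from fractions import Fraction as F


def mat_mul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def covariance(lam, omega):
    """Sigma = (I - Lambda)^{-T} Omega (I - Lambda)^{-1}, acyclic case only."""
    n = len(lam)
    ident = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    inv, power = ident, ident
    for _ in range(n - 1):  # Lambda is nilpotent, so the series terminates
        power = mat_mul(power, lam)
        inv = [[inv[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    inv_t = [list(col) for col in zip(*inv)]
    return mat_mul(mat_mul(inv_t, omega), inv)


def iv_model(l12, l23, w11, w22, w33, w23):
    """Covariance of the graph 1 -> 2 -> 3, 2 <-> 3 (vertices stored as 0, 1, 2)."""
    lam = [[F(0), F(l12), F(0)], [F(0), F(0), F(l23)], [F(0), F(0), F(0)]]
    omega = [[F(w11), F(0), F(0)], [F(0), F(w22), F(w23)], [F(0), F(w23), F(w33)]]
    return covariance(lam, omega)


# Generic point: lambda_12 != 0, so lambda_23 is recovered from Sigma.
sigma = iv_model(F(1, 2), 3, 1, 2, 3, 1)
recovered_l23 = sigma[0][2] / sigma[0][1]  # Sigma_13 / Sigma_12

# Exceptional point: lambda_12 = 0. Two different parameter choices
# (lambda_23 = 3 versus lambda_23 = 0) give the same covariance matrix.
sigma_a = iv_model(0, 3, 1, 2, 3, 1)
sigma_b = iv_model(0, 0, 1, 2, 27, 7)
same_covariance = sigma_a == sigma_b
```

In this sketch `recovered_l23` equals the true value 3, while `same_covariance` is true, so the fiber is not a singleton on the measure-zero set $\{\lambda_{12} = 0\}$.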

Definition 2.2. A mixed graph $G$ is said to be generically identifiable if there
exists a proper algebraic subset $A \subset \Theta$ so that $G$ is identifiable on
$\Theta \setminus A$.

Here, as usual, an algebraic set is defined as the zero-set of a collection of
polynomials. We again refer the reader to the introduction of Foygel et al.
(2012a) for an in-depth exposition on why generic identifiability is an often
appropriate weakening of global identifiability.

Now there will be cases when we are interested in understanding the generic
identifiability of certain coefficients of a mixed graph $G$ rather than all
coefficients simultaneously. In these cases we say that the coefficient
$\lambda_{uv}$ (or $\omega_{uv}$), for $u, v \in V$, is generically identifiable
in $G$ if the projection of the fiber $\mathcal{F}(\Lambda, \Omega)$ onto
$\lambda_{uv}$ (or $\omega_{uv}$) is a singleton for all
$(\Lambda, \Omega) \in \Theta \setminus A$ where $A \subset \Theta$ is a proper
algebraic set.

Let $\Lambda$ and $\Omega$ be matrices of indeterminates as in Equation (1.2) with
zero pattern corresponding to $G$. Then, by the Trek Rule of Wright (1921), see
also Spirtes, Glymour, and Scheines (2000), the covariance $\Sigma_{vw}$ can be
represented as a sum of monomials corresponding to treks between $v$ and $w$ in
$G$. To state the Trek Rule formally, let $\mathcal{T}(v, w)$ be the set of all
treks from $v$ to $w$ in $G$. Then for any $\pi \in \mathcal{T}(v, w)$, if $\pi$
contains no bidirected edge and has top node $z$, we define the trek monomial as

$\pi(\Lambda, \Omega) = \omega_{zz} \prod_{x \to y \in \pi} \lambda_{xy}$,

and if $\pi$ contains a bidirected edge connecting $u, z \in V$ then we define the
trek monomial as

$\pi(\Lambda, \Omega) = \omega_{uz} \prod_{x \to y \in \pi} \lambda_{xy}$.

We may then state the rule as follows.

Proposition 2.3 (Trek Rule). For all $v, w \in V$, the covariance matrix
$\Sigma = (I - \Lambda)^{-T} \Omega (I - \Lambda)^{-1}$ corresponding to a mixed
graph $G$ satisfies

$\Sigma_{vw} = \sum_{\pi \in \mathcal{T}(v, w)} \pi(\Lambda, \Omega)$.

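
The Trek Rule can be checked numerically on a small acyclic example. The sketch below enumerates all treks as a left directed path, a middle term $\omega_{ab}$ (with $a = b$ for a top node, or $a \leftrightarrow b$ for a bidirected edge), and a right directed path, and compares the resulting sum with the matrix formula. All function names and the parameter values are assumptions made for this illustration; vertices are labeled 0, 1, 2 so that they can index matrices, and the enumeration relies on the graph being acyclic.

```python
from fractions import Fraction as F


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def covariance(n, lam, omega):
    """Matrix formula (1.2); lam and omega are dicts keyed by vertex pairs."""
    L = [[lam.get((i, j), F(0)) for j in range(n)] for i in range(n)]
    W = [[omega.get((i, j), omega.get((j, i), F(0))) for j in range(n)]
         for i in range(n)]
    ident = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    inv, power = ident, ident
    for _ in range(n - 1):  # finite Neumann series, valid for acyclic graphs
        power = mat_mul(power, L)
        inv = [[inv[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    inv_t = [list(col) for col in zip(*inv)]
    return mat_mul(mat_mul(inv_t, W), inv)


def directed_paths(lam, start, end):
    """All directed paths from start to end, each as a list of edges."""
    if start == end:
        return [[]]  # in an acyclic graph only the empty path joins v to v
    return [[(start, nxt)] + rest
            for (a, nxt) in lam if a == start
            for rest in directed_paths(lam, nxt, end)]


def trek_sum(n, lam, omega, v, w):
    """Sum of trek monomials over all treks from v to w."""
    total = F(0)
    for a in range(n):
        for b in range(n):
            w_ab = omega.get((a, b), omega.get((b, a)))
            if w_ab is None:
                continue  # neither a top node (a == b) nor an edge a <-> b
            for left in directed_paths(lam, a, v):
                for right in directed_paths(lam, b, w):
                    term = w_ab
                    for edge in left + right:
                        term *= lam[edge]
                    total += term
    return total


# Graph: 0 -> 1, 1 -> 2, 0 -> 2 and 1 <-> 2.
lam = {(0, 1): F(1, 2), (1, 2): F(3), (0, 2): F(-1)}
omega = {(0, 0): F(1), (1, 1): F(2), (2, 2): F(3), (1, 2): F(1)}
sigma = covariance(3, lam, omega)
trek_rule_holds = all(trek_sum(3, lam, omega, v, w) == sigma[v][w]
                      for v in range(3) for w in range(3))
```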

Before giving the statement of the HTC, we must first define what is meant by a
half-trek system. Let $\Pi = \{\pi_1, \dots, \pi_m\}$ be a collection of $m$ treks
with each $\pi_i$ having source $x_i$ and target $y_i$. Then $\Pi$ is called a
system of treks from $X = \{x_1, \dots, x_m\}$ to $Y = \{y_1, \dots, y_m\}$ if
$|X| = |Y| = m$ so that all sources as well as all targets are pairwise distinct.
If each $\pi_i$ is a half-trek, then $\Pi$ is a system of half-treks. Moreover,
a collection $\Pi = \{\pi_1, \dots, \pi_m\}$ of treks is said to have no sided
intersection if

$Left(\pi_i) \cap Left(\pi_j) = \emptyset = Right(\pi_i) \cap Right(\pi_j), \quad \forall i \neq j$.

Let $htr(v)$ be the collection of vertices $w \in V \setminus (\{v\} \cup sib(v))$
for which there is a half-trek from $v$ to $w$; these $w$ are called half-trek
reachable from $v$. We have the following definition and result of Foygel et al.
(2012a).

Definition 2.4. A set of nodes $Y \subset V$ satisfies the half-trek criterion
with respect to a node $v \in V$ if

(i) $|Y| = |pa(v)|$,

(ii) $Y \cap (\{v\} \cup sib(v)) = \emptyset$, and

(iii) there is a system of half-treks with no sided intersection from $Y$ to $pa(v)$.
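
The set $htr(v)$ used above is easy to compute: a half-trek from $v$ either starts with a directed edge out of $v$ or with a bidirected edge to a sibling of $v$, and then continues along directed edges. The following minimal sketch uses this observation. The function names are hypothetical, and the sketch covers only $htr(v)$; it does not check condition (iii) of Definition 2.4, which requires a flow computation.

```python
def descendants(directed, sources):
    """Vertices reachable from `sources` by a (possibly empty) directed path."""
    reached, stack = set(sources), list(sources)
    while stack:
        u = stack.pop()
        for (a, b) in directed:
            if a == u and b not in reached:
                reached.add(b)
                stack.append(b)
    return reached


def htr(directed, bidirected, v):
    """Half-trek reachable vertices from v, excluding v and its siblings."""
    sib = ({b for (a, b) in bidirected if a == v}
           | {a for (a, b) in bidirected if b == v})
    # Half-treks from v: v -> ... -> w, or v <-> s -> ... -> w with s in sib(v).
    return descendants(directed, {v} | sib) - ({v} | sib)


# Chain 1 -> 2 -> 3 with the bidirected edge 1 <-> 3.
chain_directed = {(1, 2), (2, 3)}
chain_bidirected = {(1, 3)}
```

For this chain, vertex 2 is half-trek reachable from 3 through the half-trek $3 \leftrightarrow 1 \to 2$, so `htr(chain_directed, chain_bidirected, 3)` is `{2}`.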

Theorem 2.5 (HTC-identifiability). Let $(Y_v : v \in V)$ be a family of subsets of
the vertex set $V$ of a mixed graph $G$. If, for each node $v$, the set $Y_v$
satisfies the half-trek criterion with respect to $v$, and there is a total
ordering $\prec$ on the vertex set $V$ such that $w \prec v$ whenever
$w \in Y_v \cap htr(v)$, then $G$ is rationally identifiable.

The assertion that $G$ is rationally identifiable means that the inverse map
$\phi_G^{-1}$ can be represented as a rational function on $\Theta \setminus A$
where $A$ is some proper algebraic subset of $\Theta$. Clearly, rational
identifiability is a stronger condition than generic identifiability. If a graph
$G$ satisfies Theorem 2.5 we will say that $G$ is HTC-identifiable (HTCI). In a
similar vein, Theorem 2 of Foygel et al. (2012a) gives sufficient conditions for a
graph $G$ to be generically unidentifiable (with generically infinite fibers of
$\phi_G$) and we will call such graphs HTC-unidentifiable (HTCU). Graphs that are
neither HTCI nor HTCU are called HTC-inconclusive; these are the graphs on which
progress is left to be made.

As is noted in Section 8 in Foygel et al. (2012a) we may extend the power of the
HTC by using the graph decomposition techniques of Tian (2005). Let
$C_1, \dots, C_k \subset V$ be the unique partitioning of $V$ where
$v, w \in C_i$ if and only if there exists a (possibly empty) path from $v$ to $w$
composed of only bidirected edges. In other words, $C_1, \dots, C_k$ are the
connected components of $(V, B)$, the bidirected part of $G$. For
$i = 1, \dots, k$, let

$V_i = C_i \cup pa(C_i)$,
$B_i = \{v \leftrightarrow w \in G : v, w \in C_i\}$,
$D_i = \{v \to w \in G : v \in V_i, w \in C_i\}$,
$G_i = (V_i, D_i, B_i)$.

Then the mixed graphs $G_1, \dots, G_k$ are called the mixed components of $G$.
From the work of Tian (2005), Foygel et al. (2012a) present the following theorem.

Theorem 2.6 (Tian Decomposition). For an acyclic mixed graph $G$ with mixed
components $G_1, \dots, G_k$, the following holds:

(i) $G$ is rationally (or generically) identifiable if and only if all components
$G_1, \dots, G_k$ are rationally (or generically) identifiable;

(ii) $G$ is generically infinite-to-one if and only if there exists a component
$G_j$ that is generically infinite-to-one;

(iii) if each $G_j$ is generically $h_j$-to-one with $h_j < \infty$, then $G$ is
generically $h$-to-one with $h = \prod_{j=1}^{k} h_j$.

We remark that this decomposition also plays a role in non-linear models; see, 
for instance, the paper of Shpitser et al. (2014) and the references given therein. 
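
The mixed components defined before Theorem 2.6 can be computed by first finding the connected components of the bidirected part and then attaching parents. The sketch below is one straightforward way to do this; the function name `mixed_components` and the tuple-of-sets representation are assumptions made for the example.

```python
def mixed_components(vertices, directed, bidirected):
    """Return the mixed components (V_i, D_i, B_i) of G = (V, D, B)."""
    adjacency = {v: set() for v in vertices}
    for (a, b) in bidirected:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Connected components C_1, ..., C_k of the bidirected part (V, B).
    components, seen = [], set()
    for v in sorted(vertices):
        if v in seen:
            continue
        comp, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for x in adjacency[u]:
                if x not in comp:
                    comp.add(x)
                    stack.append(x)
        seen |= comp
        components.append(comp)

    result = []
    for c in components:
        # D_i: directed edges pointing into C_i. Their tails lie in
        # C_i or pa(C_i), so the condition "v in V_i" holds automatically.
        d_i = {(a, b) for (a, b) in directed if b in c}
        v_i = c | {a for (a, b) in d_i}  # V_i = C_i union pa(C_i)
        b_i = {(a, b) for (a, b) in bidirected if a in c and b in c}
        result.append((v_i, d_i, b_i))
    return result


# Instrumental variable graph 1 -> 2 -> 3 with 2 <-> 3.
iv_components = mixed_components({1, 2, 3}, {(1, 2), (2, 3)}, {(2, 3)})
```

For the instrumental variable graph the bidirected components are $\{1\}$ and $\{2, 3\}$; the second mixed component has vertex set $\{1, 2, 3\}$ because vertex 1 is a parent of vertex 2.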


3. Ancestral Decomposition 


For a later strengthening of the HTC, we will show that the generic identification
of certain subgraphs of an acyclic mixed graph $G = (V, D, B)$ implies the generic
identification of their associated edge coefficients in the larger graph $G$. This
result is straightforward and is well known in other forms. Surprisingly, however,
this simple idea can extend the applicability of the HTC when combined with the
decomposition from Theorem 2.6. We will first define what we mean by an ancestral
subset and an induced graph.

Definition 3.1. Let $V' \subset V$ be a subset of vertices. The ancestors of $V'$
form the set

$An(V') = \{v \in V : \text{there exists a directed path from } v \text{ to some } w \in V'\}$,

where we consider the empty path to be directed so that $V' \subset An(V')$. If
$V' = An(V')$, then we call $V'$ ancestral.

Definition 3.2. Let $V' \subset V$ be again a subset of vertices. The subgraph of
$G$ induced by $V'$ is the mixed graph $G_{V'} = (V', D', B')$ with

$D' = \{v \to w \in G : v, w \in V'\}$,

$B' = \{v \leftrightarrow w \in G : v, w \in V'\}$.
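
Definitions 3.1 and 3.2 translate directly into a short graph search. The following sketch computes $An(V')$, tests whether a subset is ancestral, and forms the induced subgraph; the function names are hypothetical and edges are stored as sets of ordered pairs.

```python
def ancestors(directed, targets):
    """An(V'): vertices with a (possibly empty) directed path into `targets`."""
    reached, stack = set(targets), list(targets)
    while stack:
        w = stack.pop()
        for (a, b) in directed:
            if b == w and a not in reached:
                reached.add(a)
                stack.append(a)
    return reached


def is_ancestral(directed, subset):
    """V' is ancestral exactly when V' = An(V')."""
    return ancestors(directed, subset) == set(subset)


def induced_subgraph(directed, bidirected, subset):
    """G_{V'} = (V', D', B'), keeping only edges with both endpoints in V'."""
    keep = set(subset)
    d_prime = {(a, b) for (a, b) in directed if a in keep and b in keep}
    b_prime = {(a, b) for (a, b) in bidirected if a in keep and b in keep}
    return keep, d_prime, b_prime


# Graph 1 -> 2 -> 3 with 2 <-> 3.
D = {(1, 2), (2, 3)}
B = {(2, 3)}
```

In this example $\{1, 2\}$ is ancestral while $\{2, 3\}$ is not, since vertex 1 is an ancestor of vertex 2.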

We now have the following simple fact.

Theorem 3.3. Let $G = (V, D, B)$ be a mixed graph, and let $V'$ be an ancestral
subset of $V$. If the induced subgraph $G_{V'}$ is generically (or rationally)
identifiable then so are all the corresponding edge coefficients in $G$.

Proof. Let the covariance matrix
$\Sigma = (I - \Lambda)^{-T} \Omega (I - \Lambda)^{-1}$ correspond to $G$, that
is, $\Lambda \in \mathbb{R}^D$ and $\Omega \in PD(B)$. Let $\Lambda'$ and
$\Omega'$ denote the $V' \times V'$ submatrices of $\Lambda$ and $\Omega$,
respectively, and let
$\Sigma' = (I_{|V'|} - \Lambda')^{-T} \Omega' (I_{|V'|} - \Lambda')^{-1}$ where
$I_{|V'|}$ is the $|V'| \times |V'|$ identity matrix. For ease of notation, write
$G' = G_{V'}$.

Recall that for any $v, w \in V$, the set $\mathcal{T}(v, w)$ comprises all treks
between $v$ and $w$ in $G$. Similarly, write $\mathcal{T}_{G'}(v, w)$ for the set
of treks between $v$ and $w$ in $G'$. Since $V'$ is ancestral, it holds that
$\mathcal{T}(v, w) = \mathcal{T}_{G'}(v, w)$ for all $v, w \in V'$. Thus, by
Proposition 2.3, we have that for any $v, w \in V'$

$\Sigma_{vw} = \sum_{\pi \in \mathcal{T}(v, w)} \pi(\Lambda, \Omega) = \sum_{\pi \in \mathcal{T}_{G'}(v, w)} \pi(\Lambda', \Omega') = \Sigma'_{vw}$.

Now suppose that $G'$ is generically (or rationally) identifiable. Then
$\Lambda', \Omega'$ can be generically (or rationally) recovered from $\Sigma'$.
As we have just shown that $\Sigma_{vw} = \Sigma'_{vw}$ for all $v, w \in V'$, we
have that the entries of $\Lambda, \Omega$ corresponding to $\Lambda', \Omega'$
can be recovered from $\Sigma$ generically (or rationally). □
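
The key identity in this proof, $\Sigma_{vw} = \Sigma'_{vw}$ for $v, w$ in an ancestral set $V'$, can be verified numerically, and the same computation shows that it can fail when $V'$ is not ancestral. The sketch below uses exact rational arithmetic on the graph $0 \to 1$, $1 \to 2$, $0 \to 2$, $1 \leftrightarrow 2$; the helper names and parameter values are assumptions chosen for illustration.

```python
from fractions import Fraction as F


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def covariance(lam, omega):
    """Sigma = (I - Lambda)^{-T} Omega (I - Lambda)^{-1}, acyclic case only."""
    n = len(lam)
    ident = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    inv, power = ident, ident
    for _ in range(n - 1):
        power = mat_mul(power, lam)
        inv = [[inv[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    inv_t = [list(col) for col in zip(*inv)]
    return mat_mul(mat_mul(inv_t, omega), inv)


def submatrix(m, idx):
    """The idx x idx submatrix of m."""
    return [[m[i][j] for j in idx] for i in idx]


lam = [[F(0), F(1, 2), F(-1)],
       [F(0), F(0), F(3)],
       [F(0), F(0), F(0)]]
omega = [[F(1), F(0), F(0)],
         [F(0), F(2), F(1)],
         [F(0), F(1), F(3)]]
sigma = covariance(lam, omega)


def restriction_matches(idx):
    """Does the covariance of the induced subgraph equal the block of Sigma?"""
    sigma_prime = covariance(submatrix(lam, idx), submatrix(omega, idx))
    return sigma_prime == submatrix(sigma, idx)


ancestral_ok = restriction_matches([0, 1])      # {0, 1} is ancestral
non_ancestral_ok = restriction_matches([1, 2])  # {1, 2} omits the ancestor 0
```

Here `ancestral_ok` is true and `non_ancestral_ok` is false: dropping the ancestor 0 removes the trek $1 \leftarrow 0 \to 1$, which contributes $\lambda_{01}^2 \omega_{00}$ to $\Sigma_{11}$.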

We may generalize the above theorem so that we do not have to consider the 
identifiability of all of G' and instead only look at certain edges in G'. 

Corollary 3.4. Let $G = (V, D, B)$ be a mixed graph, and let $V'$ be an ancestral
subset of $V$. If an edge coefficient of $G_{V'}$ is generically (or rationally)
identifiable then so is the corresponding coefficient in $G$.

Proof. This follows exactly as in the proof of Theorem 3.3 by only considering
a single generically (or rationally) identifiable coefficient of $G'$ at a time. □

We give an example as to how Theorem 3.3 strengthens the HTC. 

Example 3.5. It is straightforward to check that the graph $G$ from Figure 1a
is HTC-inconclusive using Algorithm 1 from Foygel et al. (2012a). We direct the
reader who does not want to perform this computation by hand to the R package
SEMID (Foygel and Drton, 2013; R Core Team, 2014). Moreover, $G$ cannot be
decomposed as its bidirected part is connected.

Figure 1. A mixed graph $G$, a subgraph induced by an ancestral subset, and the
mixed components of the induced graph. Panels: (a) An HTC-inconclusive graph $G$;
(b) The induced subgraph $G_{\{1,2,3,4,5\}}$; (c) The mixed components of
$G_{\{1,2,3,4,5\}}$.

Now the set $V' = \{1, 2, 3, 4, 5\}$ is ancestral in $G$, so we may apply
Theorem 3.3 to the induced subgraph $G' = G_{\{1,2,3,4,5\}}$. While $G'$ remains
HTC-inconclusive, the Tian decomposition of Theorem 2.6 is applicable. After
decomposing $G'$ into its mixed components, see Figure 1c, we find that each
component is HTC-identifiable and thus $G'$ itself is generically identifiable. To
show generic identifiability of $G$, we are left to show that all the coefficients
on the directed edges between $pa(6) = \{1, 2\}$ and 6 are generically
identifiable. Since $Y = \{3, 4\}$ satisfies the HTC with respect to 6 it follows,
by Lemma 3.6 below, that $\Lambda_{pa(6),6}$ is generically identifiable. Hence,
the entire matrix $\Lambda$ is generically identifiable, and since
$(I - \Lambda)^T \Sigma (I - \Lambda) = \Omega$, this implies generic
identifiability of $\Omega$. We conclude that $G$ is generically identifiable
despite being HTC-inconclusive.

Lemma 3.6. Let $G = (V, D, B)$ be a mixed graph, and let $v \in V$. If
$Y \subset V$ satisfies the HTC with respect to $v$ and for each $y \in Y$ we have
that $\Lambda_{pa(y),y}$ is generically (or rationally) identifiable, then
$\Lambda_{pa(v),v}$ is generically (or rationally) identifiable.

Proof. Suppose vertex $v$ has $m$ parents, and let $pa(v) = \{p_1, \dots, p_m\}$.
Since $Y$ satisfies the HTC for $v$, we must have $|Y| = |pa(v)| = m$. Thus we may
enumerate the set as $Y = \{y_1, \dots, y_m\}$. Define a matrix
$A \in \mathbb{R}^{m \times m}$ with entries

$A_{ij} = \begin{cases} [(I - \Lambda)^T \Sigma]_{y_i p_j} & \text{if } y_i \in htr(v), \\ \Sigma_{y_i p_j} & \text{if } y_i \notin htr(v), \end{cases}$

and define a vector $b \in \mathbb{R}^m$ with entries

$b_i = \begin{cases} [(I - \Lambda)^T \Sigma]_{y_i v} & \text{if } y_i \in htr(v), \\ \Sigma_{y_i v} & \text{if } y_i \notin htr(v). \end{cases}$

Both $A$ and $b$ are generically identifiable because we have assumed that
$\Lambda_{pa(y),y}$ is generically identifiable for every $y \in Y$. Now, from the
proof of Theorem 1 in Foygel et al. (2012a), we have
$A \cdot \Lambda_{pa(v),v} = b$, and from Lemma 2 of Foygel et al. (2012a) we
deduce that $A$ is generically invertible. It follows that
$\Lambda_{pa(v),v} = A^{-1} b$ generically so that $\Lambda_{pa(v),v}$ is
generically identifiable. □
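
To see the linear system $A \cdot \Lambda_{pa(v),v} = b$ at work, consider the chain $1 \to 2 \to 3$ with $1 \leftrightarrow 3$ (stored below with vertices 0, 1, 2). For node 2 the set $Y = \{1\}$ satisfies the HTC with $1 \notin htr(2)$, and for node 3 the set $Y = \{2\}$ satisfies the HTC with $2 \in htr(3)$, so the second step needs the coefficient recovered in the first. The sketch below carries out both steps in exact arithmetic; the graph, the parameter values, and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction as F


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def covariance(lam, omega):
    """Sigma = (I - Lambda)^{-T} Omega (I - Lambda)^{-1}, acyclic case only."""
    n = len(lam)
    ident = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    inv, power = ident, ident
    for _ in range(n - 1):
        power = mat_mul(power, lam)
        inv = [[inv[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    inv_t = [list(col) for col in zip(*inv)]
    return mat_mul(mat_mul(inv_t, omega), inv)


# True parameters: lambda_12 = 1/2, lambda_23 = 3, and omega_13 = 1.
lam = [[F(0), F(1, 2), F(0)], [F(0), F(0), F(3)], [F(0), F(0), F(0)]]
omega = [[F(1), F(0), F(1)], [F(0), F(2), F(0)], [F(1), F(0), F(3)]]
sigma = covariance(lam, omega)

# Step 1 (node 2, Y = {1}, 1 not in htr(2)): A = Sigma_11, b = Sigma_12.
l12 = sigma[0][1] / sigma[0][0]


def transformed_entry(y, j, known_parents):
    """[(I - Lambda)^T Sigma]_{y j} = Sigma_{yj} - sum_p lambda_{py} Sigma_{pj}."""
    return sigma[y][j] - sum(coef * sigma[p][j]
                             for p, coef in known_parents.items())


# Step 2 (node 3, Y = {2}, 2 in htr(3)): uses lambda_12 found in step 1.
A = transformed_entry(1, 1, {0: l12})
b = transformed_entry(1, 2, {0: l12})
l23 = b / A
```

Both coefficients are recovered exactly: `l12` equals 1/2 and `l23` equals 3. The naive ratio $\Sigma_{23} / \Sigma_{22}$ does not return $\lambda_{23}$ here, which is why the transformed row $[(I - \Lambda)^T \Sigma]_{y_i, \cdot}$ is used for half-trek reachable $y_i$.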

Algorithm 1 from Foygel et al. (2012a) (hereafter called the HTC-algorithm)
determines whether or not a mixed graph $G = (V, D, B)$ satisfies the conditions
of Theorem 2.5 and thus checks if the graph is HTCI. The HTC-algorithm
operates by iteratively looping through the nodes $v \in V$ and attempting to find
a half-trek system $\Pi$ to $pa(v)$ with $\Pi$ having sources which are in a set
of "allowed" nodes $A \subset V$. Here, a node $w$ is allowed if
$w \notin htr(v) \cup \{v\} \cup sib(v)$ or if $w$ was previously shown to be
generically identifiable in the sense that all coefficients on directed edges
$u \to w$ were shown to be generically identifiable. If such a half-trek system
$\Pi$ is found for node $v$ then Foygel et al. (2012a) show that this implies that
$v$ is generically identifiable, and thus $v$ may be added to the set of allowed
nodes for the remaining iterations. The algorithm terminates when all nodes have
been shown to be generically identifiable or once it has iterated through all
vertices and has been unable to show the generic identifiability of any new nodes.
To find a half-trek system between a suitable subset of the set of allowed nodes
$A$ and $pa(v)$, the HTC-algorithm solves a Max Flow problem on an auxiliary
network $G_{flow}(A, v)$, and this step takes $O(|V|^3)$ time when $G$ is acyclic.
If in $G_{flow}(A, v)$ one can find a flow of size $|pa(v)|$ then the half-trek
system exists. See Section 6 of Foygel et al. (2012a) for more details about how
$G_{flow}(A, v)$ is defined. Finally, for an acyclic mixed graph $G$, the
HTC-algorithm has a worst case running time of $O(|V|^5)$.

Algorithm 1 presents a simple modification of the HTC-algorithm to leverage
Corollary 3.4, extending the ability of the HTC to determine the generic
identifiability of acyclic mixed graphs. We emphasize that this algorithm
considers only certain ancestral subsets and, as such, we do not necessarily
expect the algorithm to reach a conclusion in all cases in which Corollary 3.4 may
be applicable.

Proposition 3.7. Algorithm 1 returns “yes” only if the input acyclic mixed graph
$G$ is generically identifiable and will return “yes” whenever the HTC-algorithm
does. Moreover, Algorithm 1 returns “yes” for the HTC-inconclusive graph in
Figure 1a and has time complexity at most $O(|V|^5)$.

Proof. The fact that Algorithm 1 only returns “yes” if $G$ is generically
identifiable follows from Theorem 7 in Foygel et al. (2012a,b) and our
Corollary 3.4. That Algorithm 1 returns “yes” whenever the HTC-algorithm does can
be argued as follows:

If, for a set of allowed nodes $A$ and $v \in V$, there exists $Y \subseteq A$
satisfying the HTC for $v$ then we must have that
$Y \subset S := An(v) \cup sib(An(v))$ and $Y$ satisfies the HTC for $v$ in
$G_{An(\{v\} \cup (A \cap S))}$. Lemma 4 of Foygel et al. (2012b) then yields
that there exists a set $Y'$ of allowed nodes which satisfies the HTC for $v$ in
the mixed component of $G_{An(\{v\} \cup (A \cap S))}$ containing $v$. Hence, if
$v$ is added to the set of solved nodes in the HTC-algorithm it will also be added
to the set of solved nodes in Algorithm 1. From this it follows that if the
HTC-algorithm outputs “yes” then so will Algorithm 1.

It is straightforward to check that Algorithm 1 returns “yes” for the graph in
Figure 1a and, thus, it remains only to argue that the time complexity is at most
$O(|V|^5)$. Note that the Max Flow algorithm for this problem has a running time
of $O(|V|^3)$ since $G$ is acyclic, see Foygel et al. (2012a) for details. It is
easy to see that this running time dominates on each iteration. Moreover, since at
the end of each iteration through the $|V|$ nodes of the graph, the algorithm must
either terminate or add at least one vertex to the set of solved nodes, it follows
that there are at most $|V|$ iterations. We conclude that the maximum run time of
the algorithm is $O(|V|^2 \cdot |V|^3) = O(|V|^5)$. □

Algorithm 1 A sufficient test for generic identifiability.

 1: Input: $G = (V, D, B)$, an acyclic mixed graph on $n$ nodes $v_1, \dots, v_n$
 2: Initialize: SolvedNodes ← $\{v : pa(v) = \emptyset\}$.
 3: repeat
 4:   for $v = v_1, v_2, \dots, v_n$ do
 5:     if $v \notin$ SolvedNodes then
 6:       ▷ Check if we can generically identify $\Lambda_{pa(v),v}$ using
 7:       ▷ the induced graph $G_{An(\{v\} \cup (A \cap S))}$.
 8:       $S$ ← $An(v) \cup sib(An(v))$
 9:       $A$ ← $S \cap (\text{SolvedNodes} \cup (V \setminus htr(v))) \setminus (\{v\} \cup sib(v))$
10:       $G'$ ← the mixed component of $G_{An(\{v\} \cup (A \cap S))}$ containing $v$
11:       $A$ ← $(A \cap (\text{vertices in } G')) \cup (\text{source nodes in } G')$
12:       if MaxFlow($G'_{flow}(v, A)$) $= |pa(v)|$ then
13:         SolvedNodes ← SolvedNodes $\cup \{v\}$.
14:         Skip to next iteration of the loop
15:       end if
16:       ▷ Check if we can generically identify $\Lambda_{pa(v),v}$ using
17:       ▷ the induced graph $G_{An(\{v\})}$.
18:       $G'$ ← the mixed component of $G_{An(v)}$ containing $v$
19:       $A$ ← $(A \cap (\text{vertices of } G')) \cup (\text{source nodes in } G')$
20:       if MaxFlow($G'_{flow}(v, A)$) $= |pa(v)|$ then
21:         SolvedNodes ← SolvedNodes $\cup \{v\}$.
22:       end if
23:     end if
24:   end for
25: until SolvedNodes $= V$ or no change has occurred in the last iteration.
26: Output: “yes” if SolvedNodes $= V$, “no” otherwise.


Remark 3.8. One might expect that lines 18 to 22 in Algorithm 1 are superfluous. 
This is, however, false, and indeed we have found examples of graphs G on 10 nodes 
for which Algorithm 1 returns “yes” but the corresponding algorithm with lines 18 
to 22 removed returns “no.” As these examples are fairly large we have chosen to 
not display them here.

Algorithm 2 A procedure for generating random acyclic mixed graphs.

 1: Input: A positive integer $n$ and $0 \leq p, q \leq 1$
 2: Initialize: A mixed graph $G = (V, D, B)$ with $V = \{1, \dots, n\}$, $D = B = \emptyset$
 3: Pick a random collection $E$ of $n - 1$ bidirected edges so that $(V, E)$ is a tree.
 4: $B$ ← $E$
 5: for $1 \leq i < j \leq n$ do
 6:   Add $i \leftrightarrow j$ to $B$ with probability $p$
 7: end for
 8: for $1 \leq i < j \leq n$ do
 9:   Add $i \to j$ to $D$ with probability $q$
10: end for
11: Output: $G$
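
A minimal Python sketch of Algorithm 2 is given below. The pseudocode does not say how the random tree in step 3 is drawn, so the sketch makes an assumption: it shuffles the vertices and attaches each one to a uniformly chosen earlier vertex, which yields a random spanning tree but not necessarily a uniformly distributed one. The function name and the optional `rng` argument are likewise hypothetical.

```python
import random


def random_acyclic_mixed_graph(n, p, q, rng=None):
    """Generate (V, D, B) with connected bidirected part, following Algorithm 2."""
    rng = rng or random.Random()
    vertices = list(range(1, n + 1))

    # Step 3 (assumed construction): random spanning tree of bidirected edges.
    order = vertices[:]
    rng.shuffle(order)
    bidirected = set()
    for k in range(1, n):
        u, w = order[k], rng.choice(order[:k])
        bidirected.add((min(u, w), max(u, w)))  # store each edge once, i < j

    directed = set()
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            # Steps 5-7: extra bidirected edges with probability p.
            if rng.random() < p:
                bidirected.add((i, j))
            # Steps 8-10: directed edges i -> j with i < j keep the graph acyclic.
            if rng.random() < q:
                directed.add((i, j))
    return vertices, directed, bidirected


example_vertices, example_directed, example_bidirected = (
    random_acyclic_mixed_graph(8, 0.2, 0.5, random.Random(0)))
```

Because every directed edge points from a smaller to a larger label, the natural ordering of the vertices is a topological ordering, and the tree edges guarantee that the bidirected part is connected.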


4. Computational Experiments 

We now run a simulation study to examine the effect of applying Algorithm 1 to
HTC-inconclusive graphs. All code is written in R, and we use the SEMID package
to determine HTC-identifiability and HTC-unidentifiability (R Core Team, 2014;
Foygel and Drton, 2013). For each combination of $n \in \{6, 8, 10, 12\}$,
$p \in \{.1, .2, .3\}$, and $q \in \{.2, .3, .4, .5, .6\}$ we perform the
following steps:

(i) Use Algorithm 2 with probability parameters $p$ and $q$ to generate random
acyclic mixed graphs $G$ with connected bidirected part on $n$ nodes, until we
have found 1000 graphs which are HTC-inconclusive.

(ii) For each of the 1000 HTC-inconclusive graphs $G$, use Algorithm 1 to test the
generic identifiability of $G$.

(iii) Record the proportion of the 1000 graphs that are shown to be generically
identifiable by Algorithm 1. Call this proportion $a_{n,p,q}$.

To summarize our findings we compute, for each pair $(n, q)$, the average
$b_{n,q} = \frac{1}{3} \sum_p a_{n,p,q}$. We then plot the values of $b_{n,q}$ in
Figure 2. According to this figure, Algorithm 1 provides a modest increase in the
number of graphs that can be shown to be generically identifiable. This
improvement is seen to be largest when $q$ is large, that is, when the directed
part of the mixed graph is dense.

5. Conclusion 

We have shown how the generic identifiability of a subgraph of a mixed graph 
G induced by an ancestral subset of vertices implies the generic identifiability of 
the corresponding edge coefficients in G (Theorem 3.3 and Corollary 3.4). We then 
provided, in Algorithm 1, one specific way of how to leverage this result by using the 
HTC of Foygel et al. (2012a) and the decomposition techniques of Tian (2005). Our
new algorithm provides a modest strengthening of the HTC while not increasing 
the algorithmic complexity of the HTC-algorithm of Foygel et al. (2012a). 

When saying above that Algorithm 1 constitutes one way of leveraging, we mean 
that the algorithm considers only certain ancestral subsets. While we do not have 
any examples to report, it is possible that there are acyclic mixed graphs for which 
Algorithm 1 does not return “yes” but which could be proven generically identifiable 
by a combination of Corollary 3.4, the Tian decomposition, and the HTC-algorithm. 
This said, it is not clear to us that all ancestral subsets can be considered in an
algorithm with polynomial run time. Clarifying this issue would be an interesting
topic for future work.

Figure 2. The average proportion of HTC-inconclusive graphs
found to be generically identifiable by Algorithm 1.